(54) Title: SUBSTRING SEARCHING METHOD



(57) Abstract

A substring searching method is disclosed for locating, within a character string (11), a substring that matches a given character pattern (12) of one or more characters, the characters making up the string and pattern being taken from the same character set. The method involves a preliminary phase in which a respective possible-match record (31) is formed for each character of the character set. This record indicates, for each of the N last positions taken by the pattern as it is notionally moved up to the character associated with the record (to align the start of the pattern with the character), whether the character matches the corresponding character of the pattern. During a subsequent phase of the substring search method, the pattern (12) is notionally shifted along the string being searched (11) and, at each position, a test is carried out one character at a time to ascertain whether the substring aligned with the pattern (12) matches the latter; whenever a character test indicates a mismatch, the pattern is shifted to a new position. The testing for a character match is effected by combining the possible-match record (31) of a character of the substring being examined with a current shift mask (50) that indicates the current state of knowledge concerning possible match positions, this combining operation involving forming an intermediate translated mask (51). The resultant new current shift mask (52) indicates not only whether a match is possible at the current position in view of the presence of the examined character, but also the amount by which the pattern should be shifted if a mismatch is indicated.
SUBSTRING SEARCHING METHOD

Technical Field 

The present invention relates to substring searching and, more particularly, to a method of locating within a character string a substring that matches a given character pattern of one or more characters, the characters making up the string and pattern being taken from the same character set.

The character set may, for example, be the English alphabet; in this case the given character pattern may be an English word it is desired to locate in a body of text constituting the character string. More generally, the character set could be all 256 possible combinations of 8 binary quantities (i.e. 8-bit bytes), substring searching being used to find occurrences of given byte patterns within a body of computer data.

Alternatively, the character set could simply be the binary character set of 0, 1, in which case substring searching becomes searching for a given binary sequence within a binary string.

Background

Substring searching has widespread uses in information processing ranging from pattern 
matching and feature recognition to text processing. As a result, there is a substantial 
body of prior art relevant to such searching. 

The simplest algorithm for locating the occurrences of a pattern within a text will be called the "naive algorithm" here. Imagine the text as a line of data to be scanned linearly from left to right. The text has to be checked for a complete occurrence of the pattern at all possible positions. In the naive algorithm the pattern is first checked with its leftmost end aligned with the leftmost end of the text. Each character in the pattern, starting with the leftmost and proceeding through the pattern to the right one character at a time, is compared with the character in the corresponding position in the text. Whenever a position is found in which the pattern and text characters do not match, the algorithm has determined that the pattern does not occur at this alignment in the text, and the pattern is moved one character to the right against the text. A new comparison of each pattern character with the corresponding text character is then begun, in order to check for the pattern at this new position in the text. If the end of the pattern is reached after all characters have been found to match, then the algorithm has successfully located an occurrence of the pattern in the text. If the end of the text is reached without finding an occurrence of the pattern, then the algorithm has determined that the pattern does not occur in the text. This naive algorithm is conceptually simple but inefficient.
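As an illustration only, the naive algorithm just described might be sketched in Python as follows. The function name `naive_search` and the convention of returning -1 when there is no occurrence are assumptions of this sketch, not part of the original description.

```python
def naive_search(text: str, pattern: str) -> int:
    """Return the index of the leftmost occurrence of pattern in text, or -1."""
    n, m = len(text), len(pattern)
    for pos in range(n - m + 1):          # try every alignment, left to right
        j = 0
        # Compare pattern characters left to right until one mismatches.
        while j < m and pattern[j] == text[pos + j]:
            j += 1
        if j == m:                        # end of pattern reached: all matched
            return pos
        # Otherwise move the pattern one character to the right and restart.
    return -1                             # end of text reached: no occurrence
```

Every mismatch discards all knowledge gained at that alignment, which is the source of the inefficiency noted above.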

It will, of course, be appreciated that in the naive algorithm, searching could equally well have been effected from right to left as from left to right. This is generally the case for substring searching methods, and references herein to searching leftwards or rightwards should be taken as by way of example rather than by way of limitation, except insofar as these references imply a relative direction of execution of associated actions.

Knuth et al described a more efficient algorithm which will be called the "KMP" algorithm here (see: Knuth, D.E., Morris, J.H. and Pratt, V.R. Fast pattern matching in strings, SIAM J Comput Vol 6 No 2, June 1977, pp 323-350.)
The KMP algorithm differs from the naive algorithm in exploiting the fact that if, in comparing the pattern with the text in a given position, a number of characters are found to match before a mismatch is found, then the pattern of known text characters that have already been matched restricts the positions of pattern occurrences in the text that are possible. The KMP algorithm searches for the pattern in the text from left to right, and scans characters in the pattern from left to right one character at a time. As the scanning order is defined, the algorithm can, in a preliminary phase, precompute from the pattern a table of pattern-shift data that indicates, for each position in the pattern at which a mismatch between the pattern and the text is first found, the number of characters by which the pattern should be shifted to the right in order to find the next possible occurrence of the pattern in the text. If the text is sufficiently long, the fixed overhead of precomputing the table is offset by the increased rate of testing possible positions for occurrence of the pattern in the main phase of the algorithm. The KMP algorithm is more efficient than the naive algorithm because it uses implicit information about the previously scanned characters to allow the pattern to be shifted more than one character at a time.
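A minimal Python sketch of the KMP approach follows. It uses the common textbook "failure function" formulation, in which the precomputed table stores, for each pattern prefix, the length of its longest proper prefix that is also a suffix, rather than an explicit shift distance; after a mismatch with j characters matched, the pattern is effectively shifted by j - fail[j - 1]. The names `kmp_table` and `kmp_search` are hypothetical.

```python
def kmp_table(pattern: str) -> list[int]:
    """fail[j]: length of the longest proper prefix of pattern[:j + 1]
    that is also a suffix of it (preliminary phase, pattern only)."""
    fail = [0] * len(pattern)
    k = 0
    for j in range(1, len(pattern)):
        while k > 0 and pattern[j] != pattern[k]:
            k = fail[k - 1]
        if pattern[j] == pattern[k]:
            k += 1
        fail[j] = k
    return fail


def kmp_search(text: str, pattern: str) -> int:
    """Return the index of the leftmost occurrence of pattern in text, or -1."""
    m = len(pattern)
    if m == 0:
        return 0
    fail = kmp_table(pattern)
    j = 0                              # pattern characters matched so far
    for i, ch in enumerate(text):      # the text is scanned once, left to right
        while j > 0 and ch != pattern[j]:
            j = fail[j - 1]            # shift the pattern right by j - fail[j - 1]
        if ch == pattern[j]:
            j += 1
            if j == m:
                return i - m + 1
    return -1
```

For example, `kmp_table("ABABCABAB")` gives `[0, 0, 1, 2, 0, 1, 2, 3, 4]`.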

Boyer and Moore have also described an algorithm that uses the idea of precomputing tables of pattern-shift data in a preliminary phase of the algorithm, and that uses information about the text implicit in the number of characters already matched between the text and pattern in the current pattern position (see: Boyer, R.S. and Moore, J.S. A fast string searching algorithm. Commun ACM 20, 10 (Oct 1977), pp 762-772.). Boyer and Moore's algorithm, called the "BM" algorithm here, differs from the KMP algorithm in that the BM algorithm scans characters in the pattern from right to left, while still searching the text for pattern matches from left to right. The result of this simple difference is clearest if the first text character tested at a particular pattern position in the text happens to be a character that does not occur in the pattern at all. In the KMP algorithm the tested character is at the left-hand end of the pattern, and the pattern is shifted one character to the right. In the BM algorithm the tested character is at the right-hand end of the pattern, and the pattern is shifted to the right by the number of characters in the pattern. Any smaller shift of the pattern would place one of the characters in the pattern against the tested text character, which does not occur in the pattern. In general, in the BM algorithm the pattern is shifted by using the text character that mismatched the pattern to index a precomputed table that gives the minimum pattern shift compatible with the value of the mismatched text character. Optionally, a second precomputed table can be used to give the minimum pattern shift compatible with the fact that pattern characters previously tested in the current pattern position matched the text (and so these characters in the text must match any occurrence of the pattern in another position). If both precomputed tables are used, then the pattern can be shifted by the maximum of the shifts indicated by the two tables. The BM algorithm is more efficient than the KMP algorithm because it uses a right-to-left scanning order, which on average results in the pattern being shifted by a larger number of characters on each character mismatch found.
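The following Python sketch assumes a simplified form of the BM algorithm that uses only the first of the two tables described above (often called the "bad-character" table), built from the rightmost occurrence of each character in the pattern; the optional second table is omitted. Using the rightmost occurrence is a common simplification: it can shift by less than the true minimum compatible shift, but never skips an occurrence. The function name is hypothetical.

```python
def bm_bad_character_search(text: str, pattern: str) -> int:
    """Return the index of the leftmost occurrence of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    # Preliminary phase: rightmost index of each character in the pattern
    # (later indices overwrite earlier ones in the dict comprehension).
    last = {ch: i for i, ch in enumerate(pattern)}
    pos = 0
    while pos <= n - m:
        j = m - 1
        while j >= 0 and pattern[j] == text[pos + j]:   # scan right to left
            j -= 1
        if j < 0:
            return pos                  # every character matched
        # Shift so that the rightmost equal pattern character (if any) sits
        # under the mismatched text character; absent characters give j + 1.
        pos += max(1, j - last.get(text[pos + j], -1))
    return -1
```

If the first character tested (j = m - 1) does not occur in the pattern, the shift is m, matching the behaviour described above.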

Knuth et al, in a postscript to their above-referenced article (on page 346), described a generalisation of the BM algorithm, which we will call the "BMKMP" algorithm. Knuth et al observed that when the BM algorithm shifts the pattern over the text, it "forgets" all that it already "knows" about the characters that have already been matched. They proposed the BMKMP algorithm, which retains all of the knowledge of the text that is still relevant to the comparison process by encoding this knowledge into one of a finite number of states. Each state represents one of the possible combinations of known text characters at the current pattern position. The algorithm state determines the position of the next text character to be tested, and the value of that text character and the current state together determine the next state and by how many characters the pattern should be shifted against the text. This algorithm compares well with others in that it requires a low number of character comparisons between the pattern and the text. However, several factors make the BMKMP algorithm unattractive in practice:-

1. The state table is typically rather large. If all characters in the pattern are distinct, then there are (m^2 + m)/2 states, where there are m characters in the pattern and "^" denotes exponentiation. Worse, if all characters are not distinct, then the only clear bound established on the size of the state table is 2^m states, which could be too large to implement in many cases.

2. Each state in the state table needs entries to define how the algorithm proceeds according to the character value found in the text. An entry is needed for each distinct character found in the pattern, and for all other characters together. The state table entries can be structured either to allow fast access to the entry for the character found in the text, or to allow efficient use of the memory used to store the state table. However, the aims of fast access and efficient use of memory are in conflict to some extent, and this will make the BMKMP algorithm more expensive to implement than some of the other algorithms described here.

3. The state table must be computed from the pattern before the algorithm can begin searching the text. The size and complexity of the state table make this pre-computation an expensive operation, which could only be justified if a long text was to be searched.

Sunday has described a family of substring searching algorithms (called the "Sunday" algorithms here) which are also related to the BM algorithm (see: Sunday, D.M. "A very fast substring search algorithm" Commun ACM 33, 8 (Aug 1990), pp 132-142.). These algorithms differ from each other in the order in which the characters in the pattern are compared against the text, and these different comparison orders also result in different requirements for precomputing the tables of shift values. The BM algorithm uses one precomputed table to obtain a minimum value of pattern shift from the value of the text character that mismatched, and another to obtain a minimum value of pattern shift from the number of pattern characters successfully matched against the text at the current pattern position. In contrast, Sunday's algorithms use one precomputed table to obtain a minimum value of pattern shift from all of the pattern characters tested against the text at the current pattern position, including the mismatched character (although the value of the text character found at the mismatch position is not used). Once a mismatch is found, Sunday's algorithms read the text character immediately following the end of the pattern in its current position, and use this text character to access a second precomputed table which gives a second minimum value of pattern shift. As in the BM algorithm, the pattern is then shifted by the maximum of the two minimum pattern shift distances. These changes from the BM algorithm have the effect of allowing the characters in the pattern to be tested in any order, which allows Sunday to generate a family of substring searching algorithms as described. The different orders of testing the pattern characters result in different efficiencies. However, there are two aspects of all of Sunday's algorithms that result in some loss of theoretical efficiency:-
1. Like the BM algorithm, Sunday's algorithms "forget" everything they "know" about the text whenever they shift the pattern along the text. The forgotten information includes both the text characters compared with the pattern at its last position, and the value of the text character which immediately followed the end of the pattern in its last position and which was tested to determine the size of pattern shift.

2. Also like the BM algorithm, Sunday's algorithms shift the pattern by the larger of two separately determined minimum shift values.
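By way of illustration, the Python sketch below implements only the second of Sunday's two tables, indexed by the text character immediately following the pattern; this single-table variant is usually referred to as Sunday's "Quick Search". For brevity the window is compared with a slice equality test, standing in for whichever pattern test order is chosen, and the first table is omitted. The function name is hypothetical.

```python
def sunday_quick_search(text: str, pattern: str) -> int:
    """Return the index of the leftmost occurrence of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    # shift[c]: distance that brings the rightmost c in the pattern under the
    # text character just past the window; absent characters give m + 1.
    shift = {ch: m - i for i, ch in enumerate(pattern)}
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:   # test order does not matter here
            return pos
        if pos + m >= n:                   # no text character after the window
            break
        pos += shift.get(text[pos + m], m + 1)
    return -1
```

Note that, as stated in point 1 above, nothing learned at one window is carried over to the next.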

It is an object of the present invention to provide a sub-string search method that overcomes at least certain of the drawbacks of the prior art methods.



Disclosure of the Invention

According to one aspect of the present invention, there is provided a method of locating within a character string a substring that matches a given character pattern of one or more characters, the characters making up the string and pattern being taken from the same character set, said method comprising a preliminary phase during which pattern-shift data is derived from said given pattern, and a main phase in which the pattern is notionally positioned relative to said string at a succession of different current pattern positions, in each of which one end of the pattern corresponds to a respective character position in said string and said pattern extends along the string therefrom in a predetermined direction, the main phase including the steps of:

(a) checking, with the pattern in its current pattern position, for any mismatch between the pattern and the corresponding substring of said string, this step being first carried out with the pattern in an initial current pattern position in which said one end of the pattern corresponds to one end of the string, and

(b) where step (a) ascertains a mismatch, notionally shifting the pattern along the string in said predetermined direction, by an amount derived from said pattern-shift data, to a new current pattern position,

steps (a) and (b) being repeated as necessary until a match is found or the string has been fully searched; characterised in that:

- said pattern-shift data is in the form of a respective possible-match record for each of a plurality of different character items of said character set, where a character item may comprise a single character of said character set and/or combinations of such characters, and said plurality of items is such as to enable any possible string to be built up from combinations, including repeats, of said character items,

- each possible-match record indicates, for an assumed situation of the associated said character item being present in the string and said one end of the pattern being moved in said predetermined direction towards the character item, whether said character item matches the corresponding character or characters of said pattern for each of the N last pattern positions taken by said pattern as its said one end is moved up to said character item, N being a positive integer, and

- the main phase involves determining, from the respective possible-match record of at least one character item of the substring that corresponds to said pattern in its current position, the next pattern position at which the presence of the, or all of the, said at least one character items is compatible with a match between the pattern and a substring of the string, this determination taking account of the position in said string of the or each said at least one character item relative to said one end of the pattern, by using an appropriate portion of the character item's possible-match record.

The possible-match records thus permit information on possible future matches, derived from one or more character items, to be utilised in determining the next pattern position. The use of possible-match records means that the order in which character items in the string are tested against the pattern can be varied. The possibility of combining possible-shift information derived in respect of several character items makes the method potentially powerful.

The said character items will often be constituted exclusively by all the single characters of the character set. However, in certain circumstances (for example, in searching strings made up from the binary character set of 0, 1), it may be advantageous to use combinations of characters; thus, for the binary character set case, the character items could be constituted by all 256 possible combinations of eight binary characters.

Generally, step (a) of the main phase involves identifying the character item at a particular location in the substring currently corresponding to said pattern, and checking whether this character item matches the character or characters in the corresponding location in the pattern, these operations being repeated for different locations in said substring until either a mismatch is found or the whole substring has been successfully matched to the pattern. According to a preferred implementation of the invention, step (a) also includes providing a current possible-match indicator that gives a cumulative indication of which pattern positions, yet to be reached, provide the possibility of a match, this cumulative indication being provided by the incorporation into said indicator, for the or each character item identified, of the relevant portion of its possible-match record, said current possible-match indicator being used to determine said next pattern position to which the pattern is shifted in step (b) of the main phase.


In this manner, a cumulative indication of the next possible match position is built up from possible-match information relevant to each character item tested in the current pattern position; in the event of a mismatch, the next pattern position to try is readily ascertained.

Advantageously, the said current possible-match indicator includes a possible-match indication for the current pattern position as well as for pattern positions yet to be reached, this indication being cumulatively derived from the possible-match records of the or each character item identified for checking in step (a) of the main phase. The incorporation of this information into the current possible-match indicator means that this indicator can be used not only to determine the next pattern position should a mismatch be found at the current position, but also to actually check for a mismatch at the current position in respect of each character item identified for checking. This leads to an efficient implementation of the present method.
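To make the idea concrete, the following is a minimal Python sketch of a search driven by a cumulative possible-match indicator. It rests on several assumptions that are not taken from the description above: records and indicator are Python integers in which bit i (least significant bit first) stands for the pattern position i characters to the right of the current one; a set bit means "a match is still possible"; N equals the pattern length; characters are tested from rightmost to leftmost; and on a shift the indicator is retained rather than reset. The names `build_shift_masks` and `shift_mask_search` are hypothetical. The same AND operation both detects a mismatch (bit 0 cleared) and supplies the shift (index of the lowest bit still set).

```python
def build_shift_masks(pattern: str) -> dict[str, int]:
    """Record per character: bit j (least significant first) is set iff the
    j-th character of the reversed pattern equals that character."""
    masks: dict[str, int] = {}
    for j, ch in enumerate(reversed(pattern)):
        masks[ch] = masks.get(ch, 0) | (1 << j)
    return masks


def shift_mask_search(text: str, pattern: str) -> int:
    """Return the index of the leftmost occurrence of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    full = (1 << m) - 1
    masks = build_shift_masks(pattern)   # characters not in pattern -> 0
    pos = 0
    current = full      # bit i set: an occurrence at pos + i is still possible
    while pos <= n - m:
        k = m - 1                        # test sequence: rightmost to leftmost
        while k >= 0:
            record = masks.get(text[pos + k], 0)
            # Line the record up with the indicator (dropping bits for
            # positions left of pos) and mark positions beyond offset k,
            # which this character cannot affect, as still possible.
            translated = (record >> (m - 1 - k)) | (full & ~((1 << (k + 1)) - 1))
            current &= translated
            if not current & 1:          # mismatch at the current position
                break
            k -= 1
        if k < 0:                        # whole text range tested, no mismatch
            return pos
        # Next possible position: lowest bit still set, else past the range.
        step = (current & -current).bit_length() - 1 if current else m
        pos += step
        # Keep what is known about positions not yet passed; positions newly
        # brought into range are unknown and therefore possible.
        current = (current >> step) | (full & ~((1 << (m - step)) - 1))
    return -1
```

Because retained bits only ever record facts derived from characters actually examined, keeping them across a shift cannot eliminate a genuine occurrence.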

Upon the pattern being shifted to a new current pattern position in step (b) of the main phase, the aforesaid current possible-match indicator can either be reset initially (to indicate that all pattern positions not yet passed are possible match positions), or modified to retain only the portion thereof relevant to the pattern positions not yet passed.
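The two options can be sketched as follows, assuming a hypothetical integer encoding in which bit i of an N-bit indicator (least significant bit first) refers to the pattern position i characters beyond the current one, and a set bit means a match is still possible. The name `advance_indicator` is illustrative, and the sketch assumes 1 <= step <= n_bits.

```python
def advance_indicator(current: int, step: int, n_bits: int, retain: bool) -> int:
    """Update an N-bit possible-match indicator after shifting the pattern
    `step` positions (1 <= step <= n_bits) in the search direction."""
    full = (1 << n_bits) - 1
    if not retain:
        # Option 1: reset, i.e. every position not yet passed is possible.
        return full
    # Option 2: drop the bits for positions just passed, keep the rest, and
    # mark the positions newly reachable at the far end as possible.
    kept = current >> step
    fresh = full & ~((1 << (n_bits - step)) - 1)
    return kept | fresh
```

For example, with N = 8, an indicator with bits 3 and 6 set and a shift of 3 becomes `0b11101001` when retained, keeping the fact that the position originally at offset 6 is still possible.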

At a low level, each said possible-match record and said current possible-match indicator preferably take the form of an N-bit field in which each bit position indicates whether a match is possible for a respective pattern position, the updating of said current possible-match indicator being effected by bit shifting and/or logically combining the relevant bits of a possible-match record with corresponding bits of said indicator.
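A minimal sketch of one such update follows, assuming a hypothetical encoding in which bit j of a possible-match record (least significant bit first) is set when the j-th character of the reversed pattern equals the record's character, and bit i of the indicator refers to the pattern position i characters beyond the current one. The names `record` and `combine`, and the example pattern "ABCAB", are illustrative only. The record is bit-shifted so that its bits line up with the indicator, bits for positions beyond the examined character are forced to 1 (that character says nothing about them), and the result is ANDed in.

```python
def record(pattern: str, ch: str) -> int:
    """Possible-match record for ch: bit j set iff reversed(pattern)[j] == ch."""
    return sum(1 << j for j, c in enumerate(reversed(pattern)) if c == ch)


def combine(indicator: int, rec: int, offset: int, n_bits: int) -> int:
    """Fold the record of a character found `offset` places beyond the
    current pattern position (0 <= offset < n_bits) into the indicator."""
    full = (1 << n_bits) - 1
    aligned = rec >> (n_bits - 1 - offset)            # bit shifting
    unaffected = full & ~((1 << (offset + 1)) - 1)    # positions past the char
    return indicator & (aligned | unaffected)         # logical combining


# Hypothetical pattern "ABCAB" (N = 5); start with every position possible.
ind = 0b11111
ind = combine(ind, record("ABCAB", "A"), 4, 5)   # "A" found at offset 4
print([i for i in range(5) if ind >> i & 1])     # [1, 4]
ind = combine(ind, record("ABCAB", "C"), 2, 5)   # "C" found at offset 2
print([i for i in range(5) if ind >> i & 1])     # [4]
```

After both updates only the position four characters ahead remains possible, so a mismatch at the current position would shift the pattern by four.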

Generally, the integer N will have a value equal to the character length of said pattern. However, it is also possible for N to have a value greater or less than the pattern length (the latter being required, for example, where the possible-match records are held in 32-bit memory locations and the pattern is more than 32 characters long).

Brief Description of the Drawings

A substring search method according to the present invention will now be particularly described, by way of non-limiting example, with reference to the accompanying diagrammatic drawings, in which:

Figure 1 is a diagram illustrating a predetermined pattern and a text string to be searched for occurrences of the pattern, the diagram being used to elucidate certain basic terms used in describing the substring search method;
Figure 2 is a diagram similar to Figure 1 but elucidating further terms used in describing the substring search method;
Figure 3 is a diagram illustrating the information content of shift masks used by the substring search method;
Figure 4 is a diagram illustrating the use of the shift masks to generate a cumulative current shift mask used by the substring search method;
Figure 5 is a top-level flow diagram of the substring search method;
Figure 6 is a flow diagram of a shift-mask generation step of the Figure 5 flow diagram; and
Figure 7 is a flow diagram of a pattern shift calculation step of the Figure 5 flow diagram.

Best Mode for Carrying Out the Invention

In the substring search method now to be described, it is to be understood that the string (hereinafter referred to as the "text") to be searched for occurrences of the given pattern is held in a memory, which will be referred to as the "text memory". Referring to Figure 1, both the text 11 and pattern 12 are ordered sequences of characters, which we will describe as starting at the leftmost character and ending at the rightmost character. In the Figures, the characters of the text are generally represented by dots for clarity. The present substring search method scans the text from left to right, searching for the leftmost occurrence of the given pattern. If the pattern is not present in the text, this information is returned. If an occurrence of the pattern is found, then the search method can be reapplied to the text to find further occurrences of the given pattern if this is required. The substring search method tests for occurrence of the pattern with the leftmost character of the pattern aligned with the leftmost character of the text before any other alignment is tested. The position 13 of the text character that is currently aligned with the leftmost character of the pattern 12 during the operation of the search method will be described as the current "pattern position".
If the pattern contains M characters (in Figures 1 to 4, M = 8), then at each pattern position the M characters of the pattern are aligned with an M-character substring of the text (called the "current text range" 14 here). If and only if each of the M pattern characters is equal in value to the text character with which it is aligned does the pattern position indicate an occurrence of the pattern in the text. If the method is to confirm the occurrence of the pattern at the current pattern position, then each of the pattern characters must be tested for equality with the text character with which it is aligned. In the present substring search method, the order (called the "pattern test sequence" here) in which the characters of the current text range are tested does not affect correct operation, and so this order can be chosen for maximum efficiency, and even varied throughout the operation of the method. A simple and reasonably effective choice of pattern test sequence is to test the characters in the current text range from rightmost to leftmost in sequence.

In the present substring search method, the comparisons between the text characters and the pattern characters are not made directly. Instead, the pattern is processed to provide a table of possible-match records, each of which indicates, for a respective text character, whether the character matches the corresponding pattern character for all possible relevant positions of pattern occurrences relative to that particular text character. This precomputed table and the text are then used to locate occurrences of the pattern in the text, the pattern itself not being required during the search. The table will be described as the "shift mask table" here and its possible-match records will be referred to as "shift masks". The shift mask table will now be described in more detail.

At any given pattern position, text characters that are to the left of the current text range have already been passed over by the pattern, and are no longer of interest. Text characters that are to the right of the current text range have not yet been reached by the pattern, and do not affect the operation of the search method at the current pattern position. Only text characters within the current text range will determine the operation of the search method at the current pattern position.
Conversely, a particular text character (that is, a character at a particular position within the text string) will only affect the operation of the search method at pattern positions for which that particular text character is aligned with a character of the pattern. This can be seen with reference to Figure 2 by considering the relationship between a particular text character 21 of the text 11 and the pattern 12 of M characters (note that the text character 21 has, for reasons of clarity, been represented in the text 11 by a distinct symbol, it being understood that the character may take on the character value of any of the characters making up the text).

The pattern 12 is illustrated in two pattern positions in Figure 2, these being a leftward pattern position, indicated by arrow 22L, in which the rightmost pattern character is in alignment with the text character 21, and a rightward pattern position, indicated by arrow 22R, which is the position of the text character 21. It will be readily appreciated that the pattern positions for which the text character will affect the operation of the search method are the M consecutive pattern positions starting with the pattern position indicated by arrow 22L and ending with the pattern position indicated by arrow 22R.

These M pattern positions will be described as the "mask range" 23 of the particular text character 21. For any given value of the text character 21, it can then be determined that the pattern can only occur in the text at those pattern positions within the mask range 23 of the text character 21 for which the pattern character aligned with the aforesaid particular text character 21 has the same character value as that text character. This knowledge is encapsulated in the shift mask table referred to above, which will now be more fully explained with reference to Figure 3.
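In code, the mask range and the positions within it that a given character value leaves open might be computed as below. This is an illustrative sketch with hypothetical names: a pattern position is taken to be the text index aligned with the pattern's leftmost character, and positions falling outside the text are deliberately not clipped.

```python
def mask_range(t: int, m: int) -> range:
    """Pattern positions at which the text character at index t lies under an
    m-character pattern: from the position aligning the pattern's rightmost
    character with it (arrow 22L) to the one aligning its leftmost (22R)."""
    return range(t - m + 1, t + 1)


def consistent_positions(ch: str, t: int, pattern: str) -> list[int]:
    """Positions in the mask range still possible if the text at t holds ch."""
    return [p for p in mask_range(t, len(pattern)) if pattern[t - p] == ch]


print(list(mask_range(5, 4)))                 # [2, 3, 4, 5]
print(consistent_positions("A", 5, "ABCA"))   # [2, 5]
```

Here an "A" at text index 5 can only belong to an occurrence of the example pattern "ABCA" starting at index 2 or at index 5.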

The shift mask table contains one shift mask for each character (that is, character value) in the set of characters from which the text can be composed. Each shift mask consists of M binary values called "mask bits". Each mask bit in a shift mask corresponds to one pattern position within the mask range of a text character, and indicates whether a pattern occurrence at that pattern position relative to the text character is consistent with the text character value being the same as the character value to which the shift mask relates. Figure 3 shows, for each of three possible values of the text character 21 (namely, the values "A", "B" and "C"), the possible positions of pattern occurrence relative to occurrences of that character in the text; in addition, Figure 3 shows the corresponding shift mask 31, 32, 33 for each of the three character values. In Figure 3 a tick within a shift mask indicates that the position of the tick corresponds to a possible position of a pattern occurrence, and a cross indicates that the corresponding pattern position is not a possible pattern occurrence. The rightmost mask bit in each shift mask corresponds to the rightmost pattern position within the mask range, in other words the pattern position for which the leftmost character of the pattern is aligned with the particular text character. The leftmost mask bit in each shift mask corresponds to the leftmost pattern position within the mask range, which is the pattern position for which the rightmost character of the pattern is aligned with the particular text character. The shift mask for any chosen character value can be computed by reversing the order of the pattern right to left, and then setting each mask bit of the shift mask to indicate whether the corresponding character in the reversed pattern is equal to the chosen character value.
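The reversal rule just stated can be sketched in Python as follows, rendering each mask as a string of 1 (tick) and 0 (cross), leftmost mask bit first. The function name, the string rendering and the example pattern "ABCAB" are assumptions for illustration; the pattern of Figures 1 to 3 is not reproduced in the text.

```python
def shift_mask_table(pattern: str, alphabet: str) -> dict[str, str]:
    """Shift mask for every character in `alphabet`, written leftmost mask bit
    first as '1' (tick: occurrence possible) or '0' (cross: not possible)."""
    reversed_pattern = pattern[::-1]   # leftmost bit <-> rightmost pattern char
    return {
        c: "".join("1" if r == c else "0" for r in reversed_pattern)
        for c in alphabet
    }


for ch, mask in shift_mask_table("ABCAB", "ABCD").items():
    print(ch, mask)
# A 01001
# B 10010
# C 00100
# D 00000
```

The character "D", which does not occur in the pattern, receives a mask of all crosses.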

Of course, for characters (character values) that are not present in the pattern, the corresponding shift masks each have all their mask bits set to indicate that a pattern occurrence is not possible.

The substring search method also maintains a "current shift mask", which records, for the current pattern position, which text character positions within the current text range could be pattern positions corresponding to a pattern occurrence in the text. The current shift mask consists of M binary bits, each bit corresponding to one pattern position within the current text range. The leftmost mask bit in the current shift mask corresponds to the current pattern position, and the rightmost mask bit in the current shift mask corresponds to the pattern shifted (M-1) characters to the right of its current position, so that the leftmost character of the pattern would align with the same text character that the rightmost character of the pattern aligns with at the current pattern position. Positions of possible pattern occurrences outside the current text range do not need to be recorded. Possible pattern occurrences to the left of the current text range have already been checked, and at the current pattern position the search method has not tested any text characters that affect the possibility of pattern occurrences to the right of the current text range. All pattern occurrences to the right of the current text range are regarded as possible.

The current shift mask is the basic indicator of pattern matching possibilities during execution of the search method, the current shift mask being updated for each new text character examined by combining elements of the corresponding character shift mask with the current shift mask. A general description of the operation of the search method will next be given, followed by a specific example related to the pattern already illustrated in Figures 1 to 3.

Once the pattern and text strings have been read in and the shift mask table has been
computed, the search method begins to test the text for occurrences of the pattern. The
method tests for occurrence of the pattern with the leftmost character of the pattern
aligned with the leftmost character of the text before any other alignment is tested.
Nothing is yet known about the contents of the text, so the current shift mask is
initialised to indicate that pattern occurrences are possible at all pattern positions.

The test for occurrence of the pattern at the first pattern position begins by reading the
value of the text character that is aligned with the first pattern character in the pattern
test sequence. The text character is used to retrieve the shift mask corresponding to
that character from the shift mask table. This shift mask indicates the possible
positions of pattern occurrences relative to the text character tested. Of course, the
tested character may, as in the present case, be located in the text such that some of
the pattern positions to which the corresponding shift mask relates are positioned to the
left of the current pattern position and are therefore of no relevance; however, since the
tested character will be one lying within the current text range, there will always be
a portion of the shift mask which is relevant to determining the possibility of pattern
occurrences within the current text range.

The search method therefore now proceeds by taking the relevant portion of the shift
mask of the tested character and combining it appropriately with the current shift mask
so that indications in the two masks relating to the same pattern positions are brought
together to generate a new current shift mask giving an updated indication of the
possibility of pattern occurrences at the pattern positions embraced by the current shift
mask.

The process of selecting the relevant portion of the shift mask to be combined with the
current shift mask may be viewed in terms of initially aligning the two masks and then
shifting the shift mask of the tested character leftwards relative to the current shift mask
until corresponding bits in the two masks refer to the possibility of pattern occurrence
at the same pattern positions. The number of mask bits of shift required depends on the
position of the text character read within the current text range, and equals the number
of characters by which that text character lies to the left of the rightmost character of
the current text range. The bits of the shifted shift mask which align with bits of the
current shift mask form the aforesaid relevant portion of the shift mask.

In practice, it is convenient to initially align the shift mask of the tested character with
the current shift mask and then transform the shift mask into a new mask (herein, a
translated mask) also aligned with the current shift mask, by shifting the shift mask an
appropriate number of bits leftwards, with bits shifted beyond the mask boundary being
discarded and vacated positions at the right-hand end of the mask all being set to
indicate possible pattern occurrence. Thereafter, corresponding bits in the translated
mask and current shift mask are logically combined to give a new value of the current
shift mask, a pattern occurrence at each pattern position only being possible if it was
possible before the most recent text character was read and if the translated mask
indicates that the pattern occurrence at that pattern position is consistent with the newly
known presence of the tested text character in the text. The logical combination could
be implemented by bitwise ANDing or ORing of the two masks, depending on how the
possibility of pattern occurrence is coded into binary bit values.
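The translation and combination steps can be sketched as below, purely as an illustration. It assumes masks are M-bit integers with the most significant bit as the leftmost mask bit and 1 meaning "OK", so the logical combination is a bitwise AND; the shift distance is taken as M-1-location for a 0-based location in the current text range. The function names `translate` and `combine` are hypothetical.

```python
def translate(shift_mask: int, location: int, m: int) -> int:
    """Translated mask for a character at 0-based `location` in the current text range."""
    s = m - 1 - location                 # characters to the left of the rightmost one
    full = (1 << m) - 1
    shifted = (shift_mask << s) & full   # bits pushed past the left boundary are discarded
    return shifted | ((1 << s) - 1)      # vacated right-hand bits are set to OK


def combine(current: int, translated: int) -> int:
    # An occurrence stays possible only if both masks say it is possible.
    return current & translated


m = 4
mask_a = 0b0101                          # shift mask of "A" for the pattern "ABAC"
t = translate(mask_a, location=1, m=m)   # an "A" read at location 1
print(format(t, "04b"))
print(format(combine(0b1111, t), "04b"))
```

Both lines print 0111: the current position is ruled out because pattern character 1 is a B, a shift of one puts the pattern's first A under the text character, and shifts of two or three move the start of the pattern past it, so they remain open.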

The new value of the current shift mask might or might not indicate that a pattern
occurrence at the current pattern position is possible. If a pattern occurrence at the
current pattern position is possible, then the last text character tested must have
matched the pattern character with which it is aligned. In that case the search method
proceeds to use the pattern test sequence to access further text characters for testing as
described above. If all text characters in the current text range are tested and a pattern
occurrence at the current pattern position is still indicated as being possible in the
current shift mask, then a pattern occurrence at the current pattern position has been
found, and the current pattern position is returned. If the current shift mask
indicates that a pattern occurrence at the current pattern position is not possible at any
stage before pattern occurrence is confirmed, then the search method proceeds to test
for pattern occurrence at the next possible pattern position. The next possible pattern
position is indicated by the leftmost bit of the current shift mask that indicates possible
pattern occurrence. The position of this leftmost bit within the current shift mask
indicates the number of characters by which the pattern has to be shifted to the right
along the text (called the "pattern shift" here).
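Under an assumed encoding of the current shift mask as an M-bit integer (most significant bit leftmost, 1 = OK), the pattern shift is simply the number of leading NOT-OK bits, which Python's `int.bit_length()` yields directly. When no bit is OK the result is M, i.e. the whole current text range is excluded. The name `pattern_shift` is hypothetical.

```python
def pattern_shift(current: int, m: int) -> int:
    """Distance from the left of the first OK bit in an m-bit current shift mask."""
    # bit_length() is the index of the highest set bit plus one, so the number
    # of leading NOT-OK bits is m - bit_length(); an all-NOT-OK mask gives m.
    return m - current.bit_length()


for csm in (0b1000, 0b0111, 0b0010, 0b0000):
    print(format(csm, "04b"), pattern_shift(csm, 4))
```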

Figure 4 illustrates the way in which the search method uses the current shift mask.
In the situation shown in Figure 4, the pattern 12 is currently at position 13 in the
text 11. Imagine that when the pattern first moved to this position the initial current
shift mask (CSM) 50 indicated that all pattern positions in the current text range 14 were
possible positions for pattern occurrences. Now let it be assumed that text character
41 is tested first, and is found to be an "A". The shift mask 31 for A (see Figure 3)
is retrieved and has to be shifted two bits to the left to reflect the position of character
41 as being two characters to the left of the rightmost character in the current text
range. The resultant translated mask 51 has its two rightmost bits (shown bracketed)
set to indicate possible pattern occurrences at the corresponding pattern positions. The
initial current shift mask 50 and the translated mask 51 are now combined to form a
final current shift mask 52 for the character 41 test. The leftmost bit of mask 52
indicates that the current pattern position is a possible position of a pattern occurrence
(because text character 41 matched the corresponding character in the pattern 12); in
other words, the pattern shift is found to be zero.

The search is therefore continued without shifting the pattern position, the next step
being to test the next text character in the pattern test sequence, which is character 42
in this example. Because the pattern position is unchanged, the initial current shift
mask 53 present at the start of character 42 testing is the same as the final current
shift mask 52 for character 41 testing. On testing, text character 42 is found to be a
"C", and the shift mask 33 for C (see Figure 3) needs to be shifted by zero bits to form
the corresponding translated mask 54 (this is because text character 42 is zero
characters to the left of the rightmost character in the current text range; it is the
rightmost character). The initial current shift mask 53 is then combined with the
translated mask 54, marking pattern positions as possible positions of pattern
occurrences only when both masks 53 and 54 indicate that the position is a possible
pattern occurrence. The result is the new current shift mask 55 (the final CSM for
character 42 testing).

This current shift mask 55 is then processed to find the value of the pattern shift, which
is five in this example. The current pattern position must now be shifted by five to the
position of character 41 before testing is continued (it will be seen that this is indeed a
possible position for a pattern occurrence, as both text characters 41 and 42 match the
pattern at this new pattern position). The current shift mask must also be
correspondingly shifted to form the initial current shift mask 56 for the next character
test, the appropriate shift being five bits to the left to bring the leftmost bit indicating
possible pattern occurrence (bit 57) to the leftmost bit position of the current shift
mask 56. The bits 58 shifted into the right-hand end of the current shift mask are all
set to indicate that pattern occurrence at the corresponding pattern positions is
possible, reflecting the fact that nothing is yet known about the corresponding parts of
the text. This shifting procedure preserves all information about possible positions of
pattern occurrence that is still of potential value; in this example the two bits 59 of the
current shift mask 56 indicate two pattern positions at which there cannot be
occurrences of the pattern in the text. After the new current shift mask 56 is formed,
the pattern test sequence is re-initialised, and testing is continued at the new pattern
position, beginning by reading the value of the text character that is now aligned with
the first pattern character in the pattern test sequence. The procedure is now as
previously described, the search method terminating either on finding a pattern
occurrence or on reaching the end of the text.
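The mask bookkeeping for a non-zero pattern shift, as in the step from mask 55 to mask 56, can be sketched as follows. The pattern and masks of Figures 1 to 3 are not reproduced in this text, so the example uses an invented 8-bit mask shaped like mask 55 (first OK bit five places from the left); the helper name `advance` and the encoding (most significant bit leftmost, 1 = OK) are assumptions.

```python
def advance(current: int, shift: int, m: int) -> int:
    """Shift the current shift mask left by `shift`, filling vacated bits with OK."""
    full = (1 << m) - 1
    # Bits for positions the pattern has moved past drop off the left end;
    # positions newly brought into range are unknown, hence OK.
    return ((current << shift) & full) | ((1 << shift) - 1)


m = 8
csm_55 = 0b00000100              # invented: first OK bit is five places from the left
shift = m - csm_55.bit_length()
print(shift)                     # 5
print(format(advance(csm_55, shift, m), "08b"))  # 10011111
```

The two zero bits that survive in the result play the role of bits 59: they rule out two later pattern positions before any further text has been read.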

Figures 5, 6 and 7 are flow diagrams illustrating one possible implementation of the
substring search method in program form. In the flow charts of Figures 5 to 7, the
notation "A<<B" is used to indicate that element A has been shifted to the left by B
bit positions, with the vacated rightmost bits of element A being set to indicate the
possibility of pattern occurrence ("OK").



In the implementation illustrated in Figures 5 to 7, the pattern test sequence (that is,
the order in which characters in the current text range at a particular pattern position
are tested) is determined by a one-dimensional array PATTERN_TEST_SEQUENCE,
the value of the nth element of which indicates the location in the current text range
of the nth character to be tested. The location in the current text range corresponding
to the current pattern position is assigned the value zero, with successive locations
having successively incremented values.

Figure 5 is a top-level flow chart of the substring search method, the first step 61 of
this method being to read in the text and pattern. The next step 62 is to generate the
shift mask table for all the characters making up the text string; this table is constituted
by an array CHARMASK[] where CHARMASK[N] represents the shift mask for the
character "N". The generation of the array CHARMASK[] is shown in more detail
in Figure 6, to be described more fully hereinafter. The generation of the shift mask
table is followed by a step 63 in which a variable CURRENT_SHIFT_MASK
representing the current shift mask is initialized with all its bits set to represent possible
pattern occurrence ("ALL OK"); at the same time a variable TEXT_POINTER
representing the pattern position is initialized to the start of the text string.

The search method then enters a double-looped program structure, the outer loop of
which (including steps 64 and 72) is concerned with advancing the search method along
the text, and the inner loop of which (including steps 67 and 70) is concerned with
testing for a match at the current pattern position. More particularly, step 64 ascertains
whether there is text still left to be tested and, if not, the result is returned at step 65
that no pattern has been found. However, if text is still left to be tested, then a variable
PATTERN_POINTER is initialized to one, this variable being used to indicate the
number of the character next to be tested relative to how many characters have already
been tested at the current pattern position. Thereafter, the inner loop concerning
testing for a match at the current position is entered. The first step of the inner loop
is to ascertain whether there are characters of the pattern still left to be tested (step 67);
if this is not the case, then this will be because the whole pattern has been tested and
found to match, so that a result is returned at step 68 that a pattern has been found at the
pattern position represented by the variable TEXT_POINTER. However, if there is
still pattern left to test, then a next character within the current text range is selected
(using the variable PATTERN_POINTER to access the pattern test sequence held in
the array PATTERN_TEST_SEQUENCE), is tested, and the resultant pattern shift is
determined, all this being done in step 69. The details of step 69 are shown in Figure
7, to be described more fully hereinafter.

If the determined pattern shift is zero (as tested at step 70), then this means that, for the
tested character, a pattern occurrence at the current pattern position is possible, so that
testing can continue at the current pattern position; to this end the variable
PATTERN_POINTER is incremented to indicate the number of the next character to
be tested (relative to how many characters have so far been tested at the current pattern
position). After the variable PATTERN_POINTER has been incremented, step 67 is
re-entered.

If the test carried out in program step 70 indicates that the pattern shift returned after
a character test is non-zero, then program step 72 is carried out to advance the pattern
to a new pattern position by advancing the pattern position pointer TEXT_POINTER
by the pattern shift and, at the same time, shifting the current shift mask left by the
same amount (CURRENT_SHIFT_MASK<<SHIFT). Following step 72, the search
method returns to step 64.
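Putting the steps of Figure 5 together, a compact Python sketch might look as follows. It is not the flow chart itself: pointers are 0-based, `test_sequence` is assumed to be a permutation of the locations 0 to M-1, the default order (rightmost character first) and the name `shift_mask_search` are assumptions, and the -1 "not found" result mirrors `str.find` so the output can be checked against it. Masks are assumed to be M-bit integers with the most significant bit leftmost and 1 meaning "OK".

```python
def shift_mask_search(text: str, pattern: str, test_sequence=None) -> int:
    """Index of the first occurrence of a non-empty pattern in text, or -1."""
    m = len(pattern)
    full = (1 << m) - 1
    # Step 62: shift mask table (most significant bit = leftmost mask bit, 1 = OK).
    charmask: dict[str, int] = {}
    for k, ch in enumerate(reversed(pattern)):
        charmask[ch] = charmask.get(ch, 0) | (1 << (m - 1 - k))
    if test_sequence is None:
        test_sequence = list(range(m - 1, -1, -1))  # assumed order: right to left

    text_pointer, current = 0, full                  # step 63: "ALL OK"
    while text_pointer + m <= len(text):             # step 64: text left to test?
        for location in test_sequence:               # inner loop over the test sequence
            s = m - 1 - location                     # translation distance
            char_mask = charmask.get(text[text_pointer + location], 0)
            current &= ((char_mask << s) & full) | ((1 << s) - 1)
            shift = m - current.bit_length()         # leading NOT-OK bits
            if shift:                                # step 70: non-zero pattern shift
                break
        else:
            return text_pointer                      # step 68: whole pattern matched
        text_pointer += shift                        # step 72: move pattern and mask
        current = ((current << shift) & full) | ((1 << shift) - 1)
    return -1                                        # step 65: no pattern found


print(shift_mask_search("XXABACYY", "ABAC"))  # 2
print(shift_mask_search("ABABAC", "ABAC"))    # 2
print(shift_mask_search("ABCD", "XY"))        # -1
```

Because the current shift mask only ever clears a bit when a tested character proves a mismatch, the skipped positions are exactly those already excluded, and a match is reported only after every location of the current text range has been checked.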
Considering now the flow chart of Figure 6 illustrating the generation of the shift mask
table, the first operation is to initialize all entries of the shift mask table array
CHARMASK[CHAR] to a "NOT OK" setting (the size of the array being determined by
the number of characters in the alphabet from which the text string is composed).
Following array initialisation, variables PATTERN_LOCATION and MASK_POSITION
are also initialised, the variable PATTERN_LOCATION being set to reference the first
pattern character position and the variable MASK_POSITION being set to the pattern
length.

A loop is then entered to generate the shift masks on the basis of the characters in the
pattern. The first step of this loop is to retrieve the pattern character at the location
indicated by the value of the variable PATTERN_LOCATION (step 85). Next, the
shift mask of the retrieved pattern character is modified by setting the bit at
position MASK_POSITION within the mask to "OK" (step 86).

If the retrieved character was the last character of the pattern (step 87), generation of
the shift mask table is now complete. However, if the end of the pattern has not been
reached, then the variable PATTERN_LOCATION is incremented and the variable
MASK_POSITION is decremented (step 88) before control is returned to step 85.
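The Figure 6 loop translates almost line for line into the sketch below. Mask positions are taken as numbered 1 to M from the left, so that starting MASK_POSITION at the pattern length makes the first pattern character mark the rightmost bit. The `alphabet` argument, the 0-based PATTERN_LOCATION and the integer encoding (most significant bit leftmost, 1 = OK) are assumptions, and the pattern is assumed non-empty with all its characters drawn from the alphabet.

```python
def build_charmask(pattern: str, alphabet: str) -> dict[str, int]:
    """Shift mask table generated as in the Figure 6 flow chart (sketch)."""
    m = len(pattern)                           # pattern assumed non-empty
    charmask = {ch: 0 for ch in alphabet}      # every entry starts "NOT OK"
    pattern_location = 0                       # first pattern character
    mask_position = m                          # positions numbered 1..m from the left
    while True:
        ch = pattern[pattern_location]         # step 85
        charmask[ch] |= 1 << (m - mask_position)  # step 86: bit MASK_POSITION -> "OK"
        if pattern_location == m - 1:          # step 87: last pattern character?
            return charmask
        pattern_location += 1                  # step 88
        mask_position -= 1


table = build_charmask("ABAC", "ABCD")
print({ch: format(mask, "04b") for ch, mask in table.items()})
```

Walking the pattern forwards while counting mask positions downwards sets the same bits as reading the reversed pattern left to right, so the table agrees with the reversed-pattern description given earlier.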

Considering next the flow chart of Figure 7, which shows the details of the
character-test and shift-calculation process represented by step 69 in Figure 5, the
first step of this process is to set the value of a variable PATTERN_LOCATION to the
value of the location of the character in the current text range next to be tested, this
location value being determined by using the variable PATTERN_POINTER to access
the array PATTERN_TEST_SEQUENCE (step 90). Next, the identity of the character
at the text position offset from the current pattern position (indicated by
TEXT_POINTER) by the value held in PATTERN_LOCATION is assigned to a variable
TEXT_CHAR (step 91); this is the text character next to be tested against the pattern
by use of the corresponding shift mask. To this end, the corresponding shift mask
CHARMASK[TEXT_CHAR] is retrieved and the translated mask appropriate for the
position of the character being tested is formed by shifting the retrieved shift mask to
the left by a number of bits corresponding to the value (PATTERN_LENGTH -
PATTERN_LOCATION), with the bits shifted into the right-hand end of the mask being
set to indicate the possibility of pattern occurrence.

Thereafter, in step 93 a new current shift mask is formed by combining the old current
shift mask (CURRENT_SHIFT_MASK) with the translated mask
(TRANSLATED_MASK). Finally, in step 94, the pattern shift value SHIFT is
determined by examining the newly formed current shift mask to ascertain the distance
from the left of the mask of the first "OK" position. If there are no "OK" bits in the
current shift mask, then SHIFT is set to the value of PATTERN_LENGTH.

It will be appreciated that the method of Figures 5 to 7 could be effected by a hardware
equivalent of the illustrated program form.

The operation of the substring search method as described above is only illustrative of
the present invention. Many variations of the method are possible. Some of these
variants will now be described.

The pattern testing order, which is the order in which the text characters within the
current text range are selected for reading, can be precomputed so as to obtain the
maximum statistically expected pattern shift, considering the relative frequency of
occurrence of the different pattern characters in normal usage of the alphabet.
Optionally, the relative positions of repeated characters within the pattern string can also
be considered to obtain a more precise optimisation.
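As a rough illustration only, the sketch below builds a test sequence that reads first the locations whose pattern characters are rarest in an assumed frequency table, breaking ties in favour of the rightmost location. This simple heuristic stands in for, and is not, the expected-shift optimisation described above; the frequency figures and the name `frequency_test_order` are invented for the example.

```python
def frequency_test_order(pattern: str, freq: dict[str, float]) -> list[int]:
    """Test-sequence heuristic: rarest pattern character first, ties rightmost first."""
    # Rare characters are the most likely to mismatch and so to end testing early;
    # characters missing from the table are treated as never occurring.
    return sorted(range(len(pattern)),
                  key=lambda loc: (freq.get(pattern[loc], 0.0), -loc))


english_like = {"E": 0.12, "A": 0.08, "Z": 0.001}   # invented figures
print(frequency_test_order("EAZE", english_like))  # [2, 1, 3, 0]
```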

The pattern testing order could be dynamically optimised according to the pattern
positions that are still possible positions for pattern occurrences at any stage. This
optimisation may, however, be too complex to be of practical value.

The pattern testing order can be determined in any way which provides a suitable
performance for the particular application of the searching method.

A record of text characters already tested could be maintained so as to avoid retesting
them. A text character that has still to be tested according to the pattern testing
sequence at the current pattern position might have been tested at a previous pattern
position. If so, that text character must match the corresponding pattern character at
the current pattern position (the current shift mask, which still indicates a possible
occurrence at this position, already incorporates the result of that earlier test), and so
the text character does not have to be retested. The record of text characters already
tested could take the form of a mask with a bit to indicate whether each character in the
current text range had already been read. This mask would need to be shifted whenever
the pattern, and hence the current text range, was shifted, and the appropriate bit of the
mask would need to be set whenever a text character was read.
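A sketch of this variant follows, as an illustration under stated assumptions: masks are M-bit integers (most significant bit leftmost, 1 = OK), a second mask named `tested` (a hypothetical name) records which locations of the current text range have been read, it is shifted left with zero fill whenever the pattern moves, and the inner loop skips locations already read. The rightmost-first test order is an assumption, and the read counter is included only to show that each text character is read at most once.

```python
def search_without_retesting(text: str, pattern: str) -> tuple[int, int]:
    """Return (index of first occurrence or -1, number of text characters read)."""
    m = len(pattern)
    full = (1 << m) - 1
    charmask: dict[str, int] = {}
    for k, ch in enumerate(reversed(pattern)):
        charmask[ch] = charmask.get(ch, 0) | (1 << (m - 1 - k))

    pos, current, tested, reads = 0, full, 0, 0
    while pos + m <= len(text):
        for location in range(m - 1, -1, -1):      # assumed order: right to left
            bit = 1 << (m - 1 - location)          # same bit layout as the masks
            if tested & bit:
                continue   # read at an earlier position; the OK leftmost bit implies a match
            tested |= bit
            reads += 1
            s = m - 1 - location
            char_mask = charmask.get(text[pos + location], 0)
            current &= ((char_mask << s) & full) | ((1 << s) - 1)
            shift = m - current.bit_length()
            if shift:
                break
        else:
            return pos, reads
        pos += shift
        current = ((current << shift) & full) | ((1 << shift) - 1)
        tested = (tested << shift) & full          # newly exposed locations are unread
    return -1, reads


print(search_without_retesting("XXABACYY", "ABAC"))  # (2, 4)
```

In the example the "B" at text index 3 is read once, at the first pattern position, and skipped after the pattern moves two places, so four reads suffice instead of five.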



If the substring search method of the invention is implemented in computer software,
the various masks are most efficiently held in a single computer word each. This
would imply that the masks have a limited length, say W bits (32 bits, corresponding
to patterns of up to 32 characters, would be a common value for W using present
computer technology). Longer patterns might be handled either by using multiple
computer words for each mask, or by maintaining masks that only represent the
leftmost W characters at any pattern position and so only allow the pattern to be shifted
by a maximum of W characters over the text between accesses to the text.

The pattern shift can be obtained from the current shift mask in various ways. If the
search method of the invention were embodied in electronic hardware, then dedicated
circuitry could be used. If the search method were implemented in computer software,
then either a precomputed shift table can be used to relate the contents of the current
shift mask (or parts of the contents of the current shift mask) to pattern shift values,
or alternatively a binary interpolation method could be used to find the correct shift
value. Both of these techniques, and others, are well known in the prior art; indeed, the
instruction set of some computers includes an instruction that allows the position of the
leftmost one bit in a word to be determined directly.
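For the precomputed-table approach, a sketch under an assumed word width W that is a multiple of 8: a 256-entry table gives the leading zeros of each byte, and the word is scanned a byte at a time from the left. Python's `int.bit_length()` plays the part of a leftmost-one-bit instruction and is used here only to fill the table. Assuming the M-bit current shift mask sits in the low bits of the word, the pattern shift is the leading-zero count minus (W - M). All names are hypothetical.

```python
W = 32                                             # assumed word width
LEADING_ZEROS_8 = [8 - b.bit_length() for b in range(256)]


def leading_zeros(word: int, width: int = W) -> int:
    """Leading zero bits of a `width`-bit word, one byte-table lookup per byte."""
    count = 0
    for shift in range(width - 8, -1, -8):         # most significant byte first
        byte = (word >> shift) & 0xFF
        if byte:
            return count + LEADING_ZEROS_8[byte]
        count += 8
    return width                                   # no one bit at all


def pattern_shift(current: int, m: int, width: int = W) -> int:
    # The m-bit current shift mask occupies the low m bits of the word.
    return leading_zeros(current, width) - (width - m)


print(leading_zeros(1 << 31), leading_zeros(1), leading_zeros(0))  # 0 31 32
print(pattern_shift(0b00100, 5))                                   # 2
```

An empty mask gives W - (W - M) = M, the same fallback as in step 94.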

In certain circumstances (for example, in searching strings made up from the binary
character set of 0 and 1), it may be advantageous to provide shift masks in respect of
combinations of characters and then to check for matches in terms of these
combinations; thus, for the binary character set case, character combinations could be
used that constituted all 256 possible combinations of eight binary characters.

It will be appreciated that the substring searching method can be used where the given
pattern includes a wildcard indicator indicating that the character at the corresponding
position in the pattern may be any of a group of characters (which could be the whole
character set or a specified subset of characters); in this case, the matching of character
items with characters of said pattern will take account of which characters are included
in said group. Furthermore, the method can also be used where the pattern includes
a wildcard indicator indicating that the pattern includes an unspecified segment of one
or more characters in length; in this case, the method of the invention would first be
used to locate the pattern portion preceding said segment and then applied again
to locate the pattern portion following the segment, this second search being
started from the end of the substring located by the first search.
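For the character-group wildcard, only the mask table needs to change: every character of the group is marked OK at the wildcard's position, after which the search proceeds unchanged. In the sketch below a pattern is a list whose items are single characters or sets of characters; that representation, the explicit alphabet argument, the rightmost-first test order and the function names are assumptions, as is the mask encoding (most significant bit leftmost, 1 = OK). The unspecified-segment wildcard, which uses two successive searches, is not shown.

```python
def build_group_masks(pattern: list, alphabet: str) -> dict[str, int]:
    """Shift masks where a pattern item may be a set of acceptable characters."""
    masks = {ch: 0 for ch in alphabet}
    for j, item in enumerate(pattern):
        group = item if isinstance(item, (set, frozenset)) else {item}
        for ch in group:
            # Pattern location j owns mask bit m-1-j counted from the left,
            # which is integer bit j when the most significant bit is leftmost.
            masks[ch] = masks.get(ch, 0) | (1 << j)
    return masks


def group_search(text: str, pattern: list, alphabet: str) -> int:
    """Index of the first match of a non-empty group pattern, or -1."""
    m = len(pattern)
    full = (1 << m) - 1
    masks = build_group_masks(pattern, alphabet)
    pos, current = 0, full
    while pos + m <= len(text):
        for location in range(m - 1, -1, -1):    # assumed order: right to left
            s = m - 1 - location
            char_mask = masks.get(text[pos + location], 0)
            current &= ((char_mask << s) & full) | ((1 << s) - 1)
            shift = m - current.bit_length()
            if shift:
                break
        else:
            return pos
        pos += shift
        current = ((current << shift) & full) | ((1 << shift) - 1)
    return -1


any_char = set("ABCXY")                          # "whole character set" wildcard
print(group_search("XACBY", ["A", {"B", "C"}, "B"], "ABCXY"))  # 1
print(group_search("XACBY", ["C", any_char, "Y"], "ABCXY"))    # 2
```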
CLAIMS

1. A method of locating within a character string a substring that matches a given
character pattern of one or more characters, the characters making up the string and
pattern being taken from the same character set, said method comprising a
preliminary phase during which pattern-shift data is derived from said given pattern,
and a main phase in which the pattern is notionally positioned relative to said string at
a succession of different current pattern positions in each of which one end of the
pattern corresponds to a respective character position in said string and said pattern
extends along the string therefrom in a predetermined direction, the main phase
including the steps of:

(a) checking, with the pattern in its current pattern position, for any mismatch
between the pattern and the corresponding substring of said string, this step
being first carried out with the pattern in an initial current pattern position in
which said one end of the pattern corresponds to one end of the string, and

(b) where step (a) ascertains a mismatch, notionally shifting the pattern along the
string in said predetermined direction, by an amount derived from said
pattern-shift data, to a new current pattern position,

steps (a) and (b) being repeated as necessary until a match is found or the string has
been fully searched; characterised in that:

- said pattern-shift data is in the form of a respective possible-match record for each
of a plurality of different character items of said character set, where a character item
may comprise a single character of said character set and/or combinations of such
characters, and said plurality of items is such as to enable any possible string to be built
up from combinations, including repeats, of said character items,

- each possible-match record indicates, for an assumed situation of the associated said
character item being present in the string and said one end of the pattern being moved
in said predetermined direction towards the character item, whether said character item
matches the corresponding character or characters of said pattern for each of the N last
pattern positions taken by said pattern as its said one end is moved up to said character
item, N being a positive integer, and

- the main phase involves determining, from the respective possible-match record of
at least one character item of the substring that corresponds to said pattern in its
current position, the next pattern position at which the presence of the, or all of the,
said at least one character items is compatible with a match between the pattern and a
substring of the string, this determination taking account of the position in said string
of the or each said at least one character item relative to said one end of the pattern,
by using an appropriate portion of the character item's possible-match record.

2. A method according to claim 1, wherein:

- step (a) of the main phase involves identifying the character item present at a
particular location in the substring currently corresponding to said pattern, and
checking whether this character item matches the character or characters in the
corresponding location in the pattern, these operations being repeated for different
locations in said substring until either a mismatch is found or the whole substring has
been successfully matched to the pattern; and

- step (a) also includes providing a current possible-match indicator giving a
cumulative indication of which pattern positions yet to be reached provide the
possibility of a match, this cumulative indication being provided by the incorporation
into said indicator, for the or each character item identified, of the relevant portion of
its possible-match record, said current possible-match indicator being used to determine
said next pattern position to which the pattern is shifted in step (b) of the main
phase.

3. A method according to claim 2, wherein:

- the said current possible-match indicator includes a possible-match indication
for the current pattern position as well as for pattern positions yet to be reached, this
indication being cumulatively derived from the possible-match records of the or each
character item identified in said substring in said step (a);

- said current possible-match indicator is updated, upon a said character item
being identified, with the possible-match indications in the possible-match record of the
character item that relate to the current and yet to be reached pattern positions; and

- the said operation of checking whether an identified character item matches the
character or characters in the corresponding location in the pattern is effected by
reference to the said current possible-match indicator.

4. A method according to claim 2 or 3, wherein, upon the pattern being shifted to
a new current pattern position in step (b) of the main phase, said current
possible-match indicator is reset.

5. A method according to claim 2 or claim 3, wherein, upon the pattern being
shifted to a new current pattern position in step (b) of the main phase, said current
possible-match indicator is modified to retain only the portion thereof relevant to the
pattern positions not yet passed.

6. A method according to any one of claims 2 to 5, wherein each said possible-match
record and said current possible-match indicator takes the form of an N-bit field
in which each bit position indicates whether a match is possible for a respective pattern
position, the updating of said current possible-match indicator being effected by bit
shifting and/or logically combining the relevant bits of a possible-match record with
corresponding bits of said indicator.

7. A method according to claim 1, wherein N has a value equal to the character
length of said pattern.

8. A method according to claim 1, wherein said given pattern includes a wildcard
indicator indicating that the character at the corresponding position in the pattern may
be any of a group of characters, the matching of character items with characters of said
pattern taking account of which characters are included in said group.

9. A method according to claim 1, wherein:

- step (a) of the main phase involves identifying the character item present at a
particular location in the substring currently corresponding to said pattern, and
checking whether this character item matches the character or characters in the
corresponding location in the pattern, these operations being repeated for different
locations in said substring until either a mismatch is found or the whole substring has
been successfully matched to the pattern; and

- the said operation of checking whether an identified character item matches the
character or characters in the corresponding location in the pattern is effected by
reference to the possible-match record of the character item and, in particular, to an
indication contained in the record that relates to the current pattern position.